buffer: improve performance of multiple Buffer operations#61871
buffer: improve performance of multiple Buffer operations#61871thisalihassan wants to merge 10 commits intonodejs:mainfrom
Conversation
|
Review requested:
|
d2ba38f to
495feb5
Compare
| void FastSwap16(Local<Value> receiver, | ||
| Local<Value> buffer_obj, | ||
| // NOLINTNEXTLINE(runtime/references) | ||
| FastApiCallbackOptions& options) { | ||
| HandleScope scope(options.isolate); | ||
| ArrayBufferViewContents<char> buffer(buffer_obj); | ||
| CHECK(nbytes::SwapBytes16(const_cast<char*>(buffer.data()), buffer.length())); | ||
| } |
There was a problem hiding this comment.
These fast callbacks are non-identical to the conventional callbacks they shadow.
- The existing callbacks validate their argument and throws to JS if invalid, whereas your fast callbacks hard-crash the process. It might be better to validate in the JS layer, then use the same unwrapping logic on both sides.
- Your fast callback cannot have a different return convention to the conventional callback. You will need to remove the return value from the conventional callback.
There was a problem hiding this comment.
Was removing the validation instead of moving it to js intentional?
Also fast paths can keep the validation and fall back to slow i think, so we can still validate on the src side?
There was a problem hiding this comment.
Also fast paths can keep the validation and fall back to slow i think, so we can still validate on the src side?
This is outdated, the fast API doesn't use fallback any more (since Node.js v23.x).
However, any validation in a JS wrapper should be shadowed by a CHECK or DCHECK in the C++ binding.
There was a problem hiding this comment.
SPREAD_BUFFER_ARG already contains CHECK((val)->IsArrayBufferView()) on all Slow & Fast swap methods
| void FastSwap16(Local<Value> receiver, | ||
| Local<Value> buffer_obj, | ||
| // NOLINTNEXTLINE(runtime/references) | ||
| FastApiCallbackOptions& options) { | ||
| HandleScope scope(options.isolate); | ||
| ArrayBufferViewContents<char> buffer(buffer_obj); | ||
| CHECK(nbytes::SwapBytes16(const_cast<char*>(buffer.data()), buffer.length())); | ||
| } |
There was a problem hiding this comment.
Fast callbacks should include debug tracking and call tests.
There was a problem hiding this comment.
Thanks for sharing these I will update the code
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #61871 +/- ##
==========================================
- Coverage 91.60% 89.68% -1.92%
==========================================
Files 337 676 +339
Lines 140745 206716 +65971
Branches 21802 39585 +17783
==========================================
+ Hits 128925 185400 +56475
- Misses 11595 13449 +1854
- Partials 225 7867 +7642
🚀 New features to boost your workflow:
|
1395d2f to
01ba74f
Compare
src/node_buffer.cc
Outdated
| FastApiCallbackOptions& options) { | ||
| TRACK_V8_FAST_API_CALL("buffer.swap16"); | ||
| HandleScope scope(options.isolate); | ||
| ArrayBufferViewContents<char> buffer(buffer_obj); |
There was a problem hiding this comment.
ArrayBufferViewContents is wrong here, as buffer.data() may be a stack-allocated copy of the byte data rather than the data itself. SPREAD_BUFFER_ARG is the correct macro to use here, as per the conventional callback.
There was a problem hiding this comment.
@Renegade334 replaced ArrayBufferViewContents with SPREAD_BUFFER_ARG in all three fast swap callbacks
There was a problem hiding this comment.
For toHex, wait until #61609, which improves native perf significantly (more than Uint8Array.prototype.toHex)
See also #60249 (comment)
| } else if (value.length === 1) { | ||
| // Fast path: If `value` fits into a single byte, use that numeric value. | ||
| if (normalizedEncoding === 'utf8') { | ||
| if (normalizedEncoding === 'utf8' || normalizedEncoding === 'ascii') { |
There was a problem hiding this comment.
Currently, ascii behaves exactly like latin1
Unsure if by design or accidentally
There was a problem hiding this comment.
yes, this is safe by design I am just extending the existing single byte numeric optimization to cover ASCII, since the guard already constrains it to the valid ASCII range.
|
Note on toBase64 / toBase64url: I also tried replacing the C++ base64Slice/base64urlSlice bindings with V8's Uint8Array.prototype.toBase64() (similar to the toHex change) but it caused a 35-54% regression across all buffer sizes so I reverted base64/base64url and kept only the toHex optimization which showed a clear +26-37% win. |
|
@thisalihassan toHex doesn't show a win anymore with Instead, it's ~3x slower. See nodejs/nbytes#12 |
|
Hi @ChALkeR thanks for flagging I was not aware. I benchmarked the nibble approach locally and it's indeed a much bigger win (~3x vs my ~30% with toHex). Reverted the toHex path entirely the other changes in this PR are unaffected. Should I include the nbytes nibble HexEncode optimization in this PR or keep them as separate PRs?
|
Hi @ChALkeR I see that your changes landed, congratualtions on that huge win. Is there benchmark-ci that we can run on this PR? I can rebase it |
|
I re-ran the benchmark against the latest main (which contains the nibble improvement) and found out there are few regressions caused by my code:
I tried fixing this regression but unavoidable Attaching Details below: buffer-benchmark-all-rebased.csv -- raw benchmark CSV |
|
compare-R-output.txt-- full output from Rscript benchmark/compare.R (shared as a file to avoid cluttering the PR comment). |
59f5d09 to
48abfc5
Compare
|
PS: I resolved some conflicts |
|
Hi @ChALkeR @anonrig @Renegade334 is this PR ready to land? can you also please check the latest benchmarks i have posted, I have compiled PDF for this benchmark in one my comments above for more detail
|
Qard
left a comment
There was a problem hiding this comment.
Generally LGTM, but one small nit.
Failed to start CI⚠ Commits were pushed since the last approving review: ⚠ - buffer: improve performance of multiple Buffer operations ⚠ - buffer: address PR feedback ⚠ - buffer: revert toHex in favour of nbytes HexEncode update ⚠ - resolve feedback ⚠ - resolve feedback on fast callbacks ⚠ - added comments and updated tests ⚠ - resolve feedback ✘ Refusing to run CI on potentially unsafe PRhttps://github.com/nodejs/node/actions/runs/22977319444 |
|
CI: https://ci.nodejs.org/job/node-test-pull-request/71839/ Results |
|
|
||
| function fromArrayLike(obj) { | ||
| if (obj.length <= 0) | ||
| const { length } = obj; |
There was a problem hiding this comment.
Why not move this to line 553? Can you also add a JSDoc?
There was a problem hiding this comment.
@anonrig you mean something like when you said move to line 553?
// Before
fromArrayLike(obj);
// After
fromArrayLike(obj, obj.length);
lib/buffer.js
Outdated
| if (length === undefined && typeof offset === 'string') { | ||
| encoding = offset; | ||
| length = this.length; | ||
| length = len; |
There was a problem hiding this comment.
This is extremely confusing. Please use more readable variable names.
| const data = new Array(len); | ||
| for (let i = 0; i < len; ++i) | ||
| data[i] = this[i]; |
There was a problem hiding this comment.
This can be simplified into a single line.
Array.from({ length: this.length }, (_, i) => this[i]);
There was a problem hiding this comment.
Classic for loop is likely more performant, no?
|
|
||
| Buffer.prototype.swap16 = function swap16() { | ||
| // For Buffer.length < 128, it's generally faster to | ||
| // For Buffer.length <= 32, it's generally faster to |
There was a problem hiding this comment.
Can you add a reference link to this pull-request above this line
| void Swap16(const FunctionCallbackInfo<Value>& args) { | ||
| Environment* env = Environment::GetCurrent(args); | ||
| THROW_AND_RETURN_UNLESS_BUFFER(env, args[0]); | ||
| DCHECK(args[0]->IsArrayBufferView()); |
There was a problem hiding this comment.
Although I agree that DCHECK is sufficient here, we always use non-debug CHECK's. If we want to replace this, we should do it in all of the codebase.
| DCHECK(args[0]->IsArrayBufferView()); | |
| CHECK(args[0]->IsArrayBufferView()); |
There was a problem hiding this comment.
we already haveSPREAD_BUFFER_ARG for Check thenDCheck looks redudand want me to remove this?
| FastApiCallbackOptions& options) { | ||
| TRACK_V8_FAST_API_CALL("buffer.swap16"); | ||
| HandleScope scope(options.isolate); | ||
| SPREAD_BUFFER_ARG(buffer_obj, ts_obj); |
There was a problem hiding this comment.
This fast api call is missing DCHECK(buffer_obj->IsArrayBufferView());
There was a problem hiding this comment.
SPREAD_BUFFER_ARG already do CHECK((val)->IsArrayBufferView()); which is why I didn't add DCheck here
| void Swap32(const FunctionCallbackInfo<Value>& args) { | ||
| Environment* env = Environment::GetCurrent(args); | ||
| THROW_AND_RETURN_UNLESS_BUFFER(env, args[0]); | ||
| DCHECK(args[0]->IsArrayBufferView()); |
There was a problem hiding this comment.
| DCHECK(args[0]->IsArrayBufferView()); | |
| CHECK(args[0]->IsArrayBufferView()); |
| FastApiCallbackOptions& options) { | ||
| TRACK_V8_FAST_API_CALL("buffer.swap32"); | ||
| HandleScope scope(options.isolate); | ||
| SPREAD_BUFFER_ARG(buffer_obj, ts_obj); |
There was a problem hiding this comment.
check is arraybufferview is missing.
There was a problem hiding this comment.
SPREAD_BUFFER_ARG already do CHECK((val)->IsArrayBufferView()); which is why I didn't add DCheck here
| FastApiCallbackOptions& options) { | ||
| TRACK_V8_FAST_API_CALL("buffer.swap64"); | ||
| HandleScope scope(options.isolate); | ||
| SPREAD_BUFFER_ARG(buffer_obj, ts_obj); |
There was a problem hiding this comment.
check is arraybufferview is missing
There was a problem hiding this comment.
same as above
| void Swap64(const FunctionCallbackInfo<Value>& args) { | ||
| Environment* env = Environment::GetCurrent(args); | ||
| THROW_AND_RETURN_UNLESS_BUFFER(env, args[0]); | ||
| DCHECK(args[0]->IsArrayBufferView()); |
There was a problem hiding this comment.
| DCHECK(args[0]->IsArrayBufferView()); | |
| CHECK(args[0]->IsArrayBufferView()); |
- copyBytesFrom: calculate byte offsets directly instead of
slicing into an intermediate typed array
- toString('hex'): use V8 Uint8Array.prototype.toHex() builtin
- fill: add single-char ASCII fast path
- indexOf: use indexOfString directly for ASCII encoding
- swap16/32/64: add V8 Fast API functions
- Guard ensureUint8ArrayToHex against --no-js-base-64 flag by falling back to C++ hexSlice when toHex is unavailable - Remove THROW_AND_RETURN_UNLESS_BUFFER and return value from slow Swap16/32/64 to match fast path conventions (JS validates) - Add TRACK_V8_FAST_API_CALL to FastSwap16/32/64 - Add test/parallel/test-buffer-swap-fast.js for fast API verification
Remove V8 Uint8Array.prototype.toHex() path for Buffer.toString('hex')
in favour of the upcoming nbytes HexEncode improvement (nodejs/nbytes#12)
which is ~3x faster through the existing C++ hexSlice path.
Refs: nodejs/nbytes#12
Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
1d16ad4 to
f1d938b
Compare




Multiple performance improvements to Buffer operations, verified with benchmarks (15-30 runs, comparing old vs new binaries built from same tree).
Buffer.copyBytesFrom()(+100-210%)Avoid intermediate
TypedArrayPrototypeSliceallocation by calculating byte offsets directly into the source TypedArray's underlying ArrayBuffer.Buffer.prototype.fill("t", "ascii")(+26-37%)ASCII
indexOf(+14-46%)Call
indexOfStringdirectly for ASCII encoding instead of first converting the search value to a Buffer viafromStringFastand then callingindexOfBuffer. ASCII and Latin-1 share the same byte values for characters 0-127.swap16/32/64(+3-38%)Add V8 Fast API C++ functions (
FastSwap16/32/64) alongside the existing slow path. Largest gains at len=256 (+35%).Benchmark results
Key results (15-30 runs,
***= p < 0.001):Attaching Details below:
detail.pdf -- visual breakdown
buffer-benchmark-all-rebased.csv -- raw benchmark CSV
compare-R-output.txt-- full output